Abstract. A binary phylogenetic network may or may not be obtainable from a tree by the
addition of directed edges (arcs) between tree arcs. Here, we establish a precise and easily
tested criterion (based on ‘2-SAT’) that efficiently determines whether or not any given network
can be realized in this way. Moreover, the proof provides a polynomial-time algorithm […]
questions and observations, which we outline in the conclusion.

(Keywords: Phylogenetic network, 2-SAT, antichain, phylogenetic tree, algorithm, reticulate 
evolution) 


Introduction 


Starting from any rooted binary phylogenetic tree, if we sequentially add one or more arcs 
(directed edges), each placed from a point on one tree arc to a point on another tree arc, then 
provided no directed cycles arise, we obtain a rooted binary phylogenetic network. Many 
classes of phylogenetic networks can be generated in this way, even if, at first, their descriptions
seem somewhat different. For instance, hybridization networks are usually produced
by adding two arcs from points on tree arcs to meet at a new hybridization vertex, with a 
further arc leading to a hybrid offspring; however, an equivalent network can be produced 
by starting with a phylogenetic tree and simply adding arcs just between tree arcs. 


Here, we explore a key observation due to Leo van Iersel (van Iersel 2013), namely that
not every binary phylogenetic network can be obtained from a tree by simply adding arcs
between tree arcs. In this paper, we provide a precise mathematical characterization of the 
networks that can be obtained in this manner. This in turn allows us to readily show that 
certain classes of networks are tree-based, while others are not. We then describe an efficient 
algorithm for determining whether any given network is tree-based and finding possible 
trees from which to build the network. We illustrate the use of this algorithm on a recent 
phylogenetic network concerning the complex hybrid evolution of wheat. 

Informally, we say that a binary phylogenetic network is a ‘tree-based network’ if it can be 
obtained from a rooted binary phylogenetic tree by sequentially attaching arcs between the 
arcs of the tree. This concept is relevant to the question of whether phylogenetic networks
can be viewed as really just trees with some reticulate arcs between the branches or whether
some networks are inherently less tree-like, so that the concept of an ‘underlying tree’ may be
meaningless. This is particularly relevant to the ongoing debate about whether the evolution
of certain groups (e.g. prokaryotes) should be viewed as tree-like with reticulation or whether
the very notion of a tree should be dispensed with (Dagan and Martin 2006; Doolittle and
Bapteste 2007; Martin 2011). A network that is not tree-based cannot be described as tree-like
evolution with directed links between the branches of the tree (at least for the taxa
under study; the existence of unsampled or extinct taxa (Szöllősi et al. 2013) can alter this
conclusion, as we show). Conversely, a network that is tree-based can still allow for genuine
reticulation events such as the formation of hybrid taxa from two ancestral lineages.

Phylogenetic networks can be viewed as providing either an ‘explicit’ picture of reticulate
evolution or an ‘implicit’ representation of conflict in the data (cf. Huson et al. (2010),
p. 71). In the explicit setting, vertices having two incoming arcs correspond to
hypothesised reticulate evolutionary events such as hybrid evolution, endosymbiosis, and lateral
gene transfer (either individual transfers, or ‘highways’ of lateral gene transfers (Bansal
et al. 2013)). In the ‘implicit’ setting, the networks are frequently unrooted, as in the popular
‘NeighborNet’ method (Bryant and Moulton 2003), and the degree of reticulation is a
measure of the extent to which trees constructed from different loci (‘gene trees’) disagree
with each other, even though the evolution of the taxa may be essentially tree-like (such
networks can also help identify true reticulation when it is present (Holland et al. 2008)).

Conflicts between gene trees arise by well-studied random processes at the interface of
population genetics and molecular evolution, such as incomplete lineage sorting, gene
duplication and loss, and lateral gene transfer (see Knowles and Kubatko (2010) or
Szöllősi et al. (2015)). In this case, there is often assumed to be a ‘species tree’, with
non-reticulate processes (incomplete lineage sorting and gene duplication) occurring within the
branches of the tree, and with the reticulate process of lateral gene transfer providing linking
arcs between the branches. Provided the level of random lateral transfer is not too high, it is
still possible to infer a ‘central tendency’ species tree accurately (Roch 2013; Steel et al. 2013),
as well as to correct conflicting gene trees (Bansal et al. 2014).


In this paper, we are concerned with a more basic question arising for explicit phylogenetic
networks: given a rooted binary phylogenetic network, regardless of how it may have been
obtained, we wish to determine whether or not it can be described as a tree with additional
arcs. As we discuss further in the conclusion, the interpretation of ‘tree-based’ requires some
care, as there may be other trees that equally well represent the network (so any given tree
need not be a ‘central tendency’ species tree).

The structure of this paper is as follows. We first provide a precise definition of the
concept of a tree-based network, and then state our main result. After deriving a number 
of consequences from this result, we then show how it leads to a simple algorithm to test if 
a network is tree-based, and we provide a sample application. We then discuss the delicate 
relationship between a network being tree-based and displaying a tree, before concluding 
with a number of observations and some questions. 
Definitions


First, we make the definition of a ‘tree-based network’ more precise. Given a set X of taxa,
a binary phylogenetic network (over X) refers to any directed acyclic graph N = (V, A) for
which:


• X is the set of vertices that have out-degree 0 and in-degree 1 (leaves);

• there is a unique vertex of in-degree 0, called the root (denoted ρ), which has out-degree
1 or 2;

• every vertex other than ρ or a leaf either has in-degree 2 and out-degree 1, or in-degree
1 and out-degree 2.


We say that a binary phylogenetic network N is a tree-based network (with base tree T) if
N can be described as follows. First, subdivide each arc of T as many times as required, and
call the resulting degree-2 vertices attachment points and the resulting tree T' a support tree
(for N derived from T). Next, sequentially place additional arcs between any two attachment
points, provided that the network remains binary (i.e. no two additional arcs start or end
at the same attachment point) and acyclic (i.e. no directed cycle is created). We call these 
additional arcs linking arcs. Any attachment point that is not incident with a linking arc 
is then suppressed. Notice that this allows for parallel edges to be present in a tree-based 
network (if two attachment points are adjacent in T' with a linking arc between them). 

Requiring a network N to be based on a tree T is a much stronger condition than just 
requiring that N ‘displays’ T. We will explore the relationship between these two concepts 
further in a later section. The interested reader is referred to Huson et al. (2010) for general
background on phylogenetic networks. 


Some basic observations to note at this point are as follows: 


(i) All vertices of any network that is based on T are vertices of the support tree T' (i.e. 
no new vertices are created, since a linking arc is not allowed to start or end on another 
linking arc). 


(ii) The order in which the additional arcs are attached in converting T' to N is not 
important. 

(iii) A tree-based network can have different possible base trees; for example, Fig. 1 shows
a binary network on {a, b, c} that can be based on all three of the possible 3-taxon
trees.

(iv) Not all binary phylogenetic networks are tree-based; one example (from van Iersel
(2013)) is shown in Fig. 2(i) and another in Fig. 2(iii).
Figure 1: A tree-based network on three leaves in which all possible trees on three leaves
could be the base. 


In contrast to this last point, several classes of networks are tree-based. Clearly, horizontal
gene transfer networks define one such class (since there is a canonical tree associated
with each such network which contains every vertex of the network (Francis and Steel 2015)),
but so, too, do tree-child networks, as noted by van Iersel (2013) (see Corollary 2 below).
Since any hybridization network is also a tree-child network, it follows that every hybridization
network is tree-based (as noted above). Our goal here is to characterize when a binary
phylogenetic network is tree-based, and to provide criteria for deciding whether a given network
is tree-based, along with an algorithm to determine this. We also explore the subtle
relationship between a network being based on a tree, and the weaker but more widely-known
notion of the network displaying a tree.


The main theorem 

Our main theoretical result can be stated informally as follows. The question of whether
or not a binary network is tree-based can be restated as an equivalent question in propositional
logic called ‘2-SAT’, which is easily solved. To make this precise we introduce
some additional notation.

Let N = (V, A) be a rooted binary phylogenetic network on leaf set X. For an arc
a = (u, v) ∈ A, we say that u is the source and v the target of a, and that a is an incoming
arc of v and an outgoing arc of u. Let S₁ be the subset of arcs in A whose source has
out-degree 1 or whose target has in-degree 1. We say that a subset S of A is admissible if S
contains S₁ and satisfies the following two constraints for every v ∈ V:

(C1) If v has in-degree 2 then exactly one of its incoming arcs is in S.

(C2) If v has out-degree 2 then at least one of its outgoing arcs is in S.
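The definitions of S₁ and of an admissible subset can be checked mechanically. The following Python sketch is our own illustration, not code from this paper: it assumes a network is given as a list of (source, target) pairs with no parallel arcs, and the function names and the small network `N` are hypothetical.

```python
from collections import Counter, defaultdict


def s1_arcs(arcs):
    """S1: arcs whose source has out-degree 1 or whose target has in-degree 1."""
    indeg = Counter(v for _, v in arcs)
    outdeg = Counter(u for u, _ in arcs)
    return {(u, v) for u, v in arcs if outdeg[u] == 1 or indeg[v] == 1}


def is_admissible(arcs, S):
    """True if S contains S1 and satisfies (C1) and (C2) at every vertex."""
    S = set(S)
    if not s1_arcs(arcs) <= S:
        return False
    incoming, outgoing = defaultdict(list), defaultdict(list)
    for u, v in arcs:
        outgoing[u].append((u, v))
        incoming[v].append((u, v))
    # (C1): each vertex of in-degree 2 has exactly one incoming arc in S
    c1 = all(sum(a in S for a in ins) == 1
             for ins in incoming.values() if len(ins) == 2)
    # (C2): each vertex of out-degree 2 has at least one outgoing arc in S
    c2 = all(any(a in S for a in outs)
             for outs in outgoing.values() if len(outs) == 2)
    return c1 and c2


# Hypothetical network: root r, tree vertices p and q,
# reticulations x and y, leaves a and b
N = [("r", "p"), ("r", "q"), ("p", "x"), ("p", "y"),
     ("q", "x"), ("q", "y"), ("x", "a"), ("y", "b")]
print(sorted(s1_arcs(N)))  # [('r', 'p'), ('r', 'q'), ('x', 'a'), ('y', 'b')]
print(is_admissible(N, s1_arcs(N) | {("p", "x"), ("q", "y")}))  # True
print(is_admissible(N, s1_arcs(N) | {("p", "x"), ("p", "y")}))  # False: q breaks (C2)
```

In the first candidate, (p, x) and (q, y) are tree arcs, so (q, x) and (p, y) would be the linking arcs.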

The problem 2-SAT is a classic and easily solved problem in logic: it asks whether a
conjunction of clauses, each involving just two literals (variables or their negations), has a
satisfying assignment. For example, suppose that, in a court case, witnesses have made the
following three statements as to who may or may not have been involved in a crime: ‘Peter or
Susan’, ‘John or not Peter’, and ‘not John or not Susan’. As an instance of 2-SAT, the
satisfiability question asks whether these three witness statements could all be correct. In
this case they can, for example, if John and Peter were involved in the crime but Susan was not.

Figure 2: Some pertinent examples of binary networks. Example (i) is from van Iersel (2013).
Note that, despite first appearances, (ii) is tree-based via the tree arcs (u,x) and (x,v) (so
that (u,v) and (w,x) are linking arcs). Neither (i) nor (iii) is tree-based, as can easily be
checked by Proposition 2, since (i) has an antichain consisting of three vertices but only two
leaves, while (iii) has an antichain of four vertices but only three leaves.

With these concepts in hand, we can now state the main result, the proof of which is given in
the Appendix.

Theorem 1.


(a) A rooted binary phylogenetic network N = (V, A) is tree-based if and only if there exists
an admissible subset S of A. In this case S forms the arcs of a valid support tree for
N and the arcs in A − S are linking arcs. Moreover, there is a bijection between the
set of admissible subsets S of A and the set of valid support trees for N.

(b) Determining whether N is tree-based can be restated as a question of whether a particular
instance of 2-SAT has a satisfying assignment, and this can be solved in polynomial
(linear) time.
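Part (b) rests on the classical fact that 2-SAT is decidable in linear time. As a hedged illustration, the sketch below is a generic solver (not the authors' code) using the implication-graph and strongly-connected-component method of Aspvall et al. (1979), with Kosaraju's algorithm for the components; the encoding of a literal as a `(variable, polarity)` pair is our own assumption. It is applied to the witness example above.

```python
from collections import defaultdict


def solve_2sat(variables, clauses):
    """Return a satisfying assignment {variable: bool}, or None if unsatisfiable.

    A literal is a pair (variable, polarity); a clause is a pair of literals.
    """
    graph, rgraph = defaultdict(list), defaultdict(list)
    for (x, a), (y, b) in clauses:
        # (l1 or l2) is equivalent to (not l1 -> l2) and (not l2 -> l1)
        for src, dst in (((x, not a), (y, b)), ((y, not b), (x, a))):
            graph[src].append(dst)
            rgraph[dst].append(src)
    nodes = [(x, pol) for x in variables for pol in (True, False)]

    # Kosaraju, pass 1: iterative DFS recording vertices in finishing order
    order, seen = [], set()
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        stack = [(start, iter(graph[start]))]
        while stack:
            node, it = stack[-1]
            nxt = next((n for n in it if n not in seen), None)
            if nxt is None:
                stack.pop()
                order.append(node)
            else:
                seen.add(nxt)
                stack.append((nxt, iter(graph[nxt])))

    # Pass 2: DFS on the reversed graph; component labels follow a topological order
    comp, label = {}, 0
    for start in reversed(order):
        if start in comp:
            continue
        comp[start] = label
        stack = [start]
        while stack:
            node = stack.pop()
            for n in rgraph[node]:
                if n not in comp:
                    comp[n] = label
                    stack.append(n)
        label += 1

    assignment = {}
    for x in variables:
        if comp[(x, True)] == comp[(x, False)]:
            return None  # x and not-x imply each other
        # make true the literal whose component comes later in topological order
        assignment[x] = comp[(x, True)] > comp[(x, False)]
    return assignment


witness = [(("Peter", True), ("Susan", True)),
           (("John", True), ("Peter", False)),
           (("John", False), ("Susan", False))]
print(solve_2sat(["Peter", "Susan", "John"], witness))
# {'Peter': True, 'Susan': False, 'John': True}
```

Each clause (l1 or l2) contributes the implications not-l1 → l2 and not-l2 → l1, and the formula is unsatisfiable exactly when some variable and its negation fall in the same strongly connected component.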


An immediate consequence of Part (a) of this theorem is that any
non-tree-based rooted binary phylogenetic network can be expanded to become tree-based
by the addition of extra arcs and leaves, as the next corollary shows. This is relevant in
biology because these additional leaves may represent taxa that have become extinct in the
past, and so cannot be sampled today (Fournier et al. 2009; Szöllősi et al. 2013), or are
still extant today but have not been included in the sample of taxa under study. In other
words, any binary phylogenetic network can be realized as a tree with additional linking
arcs, provided that one allows additional ‘unseen’ taxa in the past to play a certain role in
the evolution of the taxa sampled today.

Corollary 1. For any binary phylogenetic network N over leaf set X there exists a tree-based
phylogenetic network N⁺ over a leaf set that contains X for which N = N⁺|X.

Here N⁺|X is the restriction of N⁺ to those vertices that have a path to at least one leaf
in X (it is obtained from N⁺ by deleting all vertices and arcs that do not lie on a path to a
leaf in X, and then suppressing any vertices of in-degree and out-degree equal to 1).

Proof. Write N = (V, A) and let S = A. Then S fails to be admissible only by violations
of condition (C1). Thus we can convert S into an admissible subset S' by performing the
following step for each vertex v of N that has in-degree 2. Select either one of the incoming
arcs of v, say (u, v), subdivide this arc with a new vertex w, and attach a new leaf x_v
(specific to v) to w. Now remove (u, v) from S and replace it with the arcs (u, w)
and (w, x_v). Once this step is performed for all vertices of in-degree 2, the set S' of arcs of
the resulting network N⁺ is admissible (each vertex of in-degree 2 now has exactly one incoming
arc in S', and each new vertex w has its outgoing arc (w, x_v) in S'), and so N⁺ is tree-based.
Moreover, N⁺|X = N. □


Necessary and sufficient conditions for tree-based 


We now describe two further ways of characterizing tree-based networks. These are more immediate
and of less direct algorithmic relevance, but we state them here as they provide a more 
complete picture of what ‘tree-based’ means. We say that a set of arcs in a directed graph 
is independent if no two arcs in the set share a vertex. The proof of the following result is 
provided in the Appendix. 

Proposition 1. Let N = (V, A) be a rooted binary phylogenetic network on leaf set X. The
following are equivalent.

(a) N is tree-based.

(b) There is an independent set I of arcs of N for which N' = (V, A − I) is a rooted tree.

(c) N has a rooted spanning tree (with root ρ) that contains the arcs in S₁ and with all its
leaves in X.


Next we consider a necessary condition for N to be tree-based, based on the concept of
‘antichains’. This can provide a rapid way to verify that certain networks cannot be
tree-based. An antichain in any directed graph is simply a subset 𝒜 of vertices that has the
property that there is no directed path in the graph from any one vertex in 𝒜 to any other
vertex in 𝒜. Let N be a binary phylogenetic network. If, for any antichain 𝒜 of non-leaf
vertices in N, there exist at least |𝒜| arc-disjoint paths from 𝒜 to the leaf set, we say that N
satisfies the antichain-to-leaf property. By a version of Menger’s theorem for disjoint sets of
vertices in directed graphs (Böhme et al. 2001), the antichain-to-leaf property is equivalent
to the statement that for any antichain 𝒜 of non-leaf vertices in N, at least |𝒜| arcs of N
must be cut in order to separate 𝒜 from X.

The following result, the proof of which is also in the Appendix, provides a necessary 
condition for N to be tree-based; if it fails we know immediately that N cannot be tree-based. 

Proposition 2. If a binary phylogenetic network over leaf set X is tree-based then it satisfies
the antichain-to-leaf property. In particular, the largest antichain in any tree-based network
N over X has size exactly |X|. Thus any network that has an antichain larger than its
number of leaves cannot be tree-based.

Proposition 2 provides an easy way to verify that the network in Fig. 2(i) is not tree-based,
since it contains an antichain (the set {u, v, w}) that is larger than the leaf set of the
network.
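For small examples, the necessary condition of Proposition 2 can be checked by brute force: compute reachability, search for the largest antichain of non-leaf vertices, and compare its size with the number of leaves. The sketch below is exponential in the number of vertices and is meant only for toy networks (a polynomial method via Dilworth's theorem and bipartite matching exists but is longer to write); `N_bad` and `N_ok` are hypothetical networks, not those drawn in Fig. 2.

```python
from itertools import combinations


def descendants(arcs):
    """Map each vertex to the set of vertices reachable from it by a non-empty path."""
    children = {}
    for u, v in arcs:
        children.setdefault(u, []).append(v)
        children.setdefault(v, [])
    reach = {}
    for s in children:
        seen, stack = set(), list(children[s])
        while stack:
            x = stack.pop()
            if x not in seen:
                seen.add(x)
                stack.extend(children[x])
        reach[s] = seen
    return reach


def largest_nonleaf_antichain(arcs):
    """Largest set of pairwise incomparable non-leaf vertices (brute force)."""
    reach = descendants(arcs)
    nonleaves = [v for v in reach if reach[v]]  # vertices with at least one child
    for k in range(len(nonleaves), 0, -1):
        for cand in combinations(nonleaves, k):
            if all(b not in reach[a] and a not in reach[b]
                   for a, b in combinations(cand, 2)):
                return set(cand)
    return set()


def fails_proposition2(arcs):
    """True if some antichain of non-leaf vertices outnumbers the leaves."""
    reach = descendants(arcs)
    leaves = [v for v in reach if not reach[v]]
    return len(largest_nonleaf_antichain(arcs)) > len(leaves)


N_bad = [("r", "s"), ("r", "t"), ("s", "p"), ("s", "q"), ("t", "p"),
         ("t", "q"), ("p", "z"), ("q", "z"), ("z", "a")]
N_ok = [("r", "p"), ("r", "q"), ("p", "x"), ("p", "y"),
        ("q", "x"), ("q", "y"), ("x", "a"), ("y", "b")]
print(fails_proposition2(N_bad), fails_proposition2(N_ok))  # True False
```

As Fig. 3 shows, passing this test does not guarantee that a network is tree-based.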

It might seem plausible that the antichain-to-leaf property is also a sufficient condition
for a network to be tree-based. Alas, this is not the case: Fig. 3 shows a particular case
where the antichain-to-leaf property holds, yet the network is not tree-based.




Figure 3: (a) A network that is not tree-based, even though it satisfies the antichain-to-leaf
property. That this network fails to be tree-based can be verified by applying Corollary 3:
starting with the arcs in S₁ labelled t (shown in bold in (b)) and applying the conditions
(C1)' and (C2)' repeatedly, we are forced to label both of the arcs outgoing from v by f, and
at the next step (C2)' would assign one of these two arcs a second label t.


We turn now to some further necessary and sufficient conditions for N to be tree-based. 

Proposition 3. Consider a binary phylogenetic network N over leaf set X. 

(i) If each vertex of N of in-degree 2 has parents of out-degree 2, then N is tree-based. 

(ii) If N has a vertex of in-degree 2 whose parents both have out-degree 1, then N is not
tree-based.


Proof. Part (i) was established in the proof of Lemma 1 of Gambette et al. (2015) by an
elegant application of Hall’s matching theorem for digraphs. For part (ii), let v be such a
vertex. Its two parents form an antichain, but every path from these parents to a leaf must
pass through the single arc leaving v, so no two such paths are arc-disjoint, thereby violating
Proposition 2. □
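Proposition 3 only needs the out-degrees of the parents of each reticulation, so it gives a fast first screen. In this hedged sketch (our own illustration), a network is a list of (source, target) arcs, and a verdict of 'undecided' means that neither part applies, so Theorem 1 must be used instead; `N_mixed` is a hypothetical network of that kind.

```python
from collections import Counter, defaultdict


def proposition3(arcs):
    """Apply Proposition 3: return 'tree-based', 'not tree-based' or 'undecided'."""
    indeg = Counter(v for _, v in arcs)
    outdeg = Counter(u for u, _ in arcs)
    parents = defaultdict(list)
    for u, v in arcs:
        parents[v].append(u)
    retics = [v for v in parents if indeg[v] == 2]
    if any(all(outdeg[p] == 1 for p in parents[v]) for v in retics):
        return "not tree-based"  # part (ii)
    if all(outdeg[p] == 2 for v in retics for p in parents[v]):
        return "tree-based"      # part (i)
    return "undecided"           # neither part applies


N_bad = [("r", "s"), ("r", "t"), ("s", "p"), ("s", "q"), ("t", "p"),
         ("t", "q"), ("p", "z"), ("q", "z"), ("z", "a")]
N_ok = [("r", "p"), ("r", "q"), ("p", "x"), ("p", "y"),
        ("q", "x"), ("q", "y"), ("x", "a"), ("y", "b")]
N_mixed = [("r", "u"), ("r", "v"), ("u", "g"), ("v", "g"),
           ("g", "h"), ("v", "h"), ("h", "a"), ("u", "b")]
print(proposition3(N_bad), proposition3(N_ok), proposition3(N_mixed))
# not tree-based tree-based undecided
```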


We end this section by showing how Theorem 1 provides a convenient way to verify that
tree-child networks are tree-based, as are tree-sibling networks (a result stated by van Iersel
(2013) without proof).

Recall that a network is a tree-child network if every non-leaf vertex is the parent of at
least one vertex of in-degree 1, while (more generally) a tree-sibling network is a network for
which every vertex v of in-degree 2 has a sibling v' that has in-degree 1 (i.e. v' is either a
leaf or has out-degree 2). ‘Sibling’ here means that the vertices share a parent. A network N
is reticulation visible if every vertex v of N of in-degree 2 has the property that, for some
leaf x of N, all paths from the root of N to x pass through v. Tree-child networks are a subset
of the tree-sibling networks, but tree-sibling and reticulation-visible networks represent
different classes (and neither is a subset of the other).
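Both definitions are local conditions on children and siblings, so they can be tested directly. A minimal sketch, assuming a network is given as a list of (source, target) arcs; the helper names and the two example networks are hypothetical.

```python
from collections import Counter, defaultdict


def _structure(arcs):
    """In-degrees plus child and parent lists for an arc list."""
    indeg = Counter(v for _, v in arcs)
    children, parents = defaultdict(list), defaultdict(list)
    for u, v in arcs:
        children[u].append(v)
        parents[v].append(u)
    return indeg, children, parents


def is_tree_child(arcs):
    """Every non-leaf vertex has at least one child of in-degree 1."""
    indeg, children, _ = _structure(arcs)
    return all(any(indeg[c] == 1 for c in cs) for cs in children.values())


def is_tree_sibling(arcs):
    """Every vertex of in-degree 2 has a sibling (shared parent) of in-degree 1."""
    indeg, children, parents = _structure(arcs)
    for v in [w for w in parents if indeg[w] == 2]:
        siblings = {c for p in parents[v] for c in children[p] if c != v}
        if not any(indeg[s] == 1 for s in siblings):
            return False
    return True


N_tc = [("r", "u"), ("r", "v"), ("u", "h"), ("u", "a"),
        ("v", "h"), ("v", "c"), ("h", "b")]
N_ok = [("r", "p"), ("r", "q"), ("p", "x"), ("p", "y"),
        ("q", "x"), ("q", "y"), ("x", "a"), ("y", "b")]
print(is_tree_child(N_tc), is_tree_sibling(N_tc),
      is_tree_child(N_ok), is_tree_sibling(N_ok))  # True True False False
```

`N_ok` is neither tree-child nor tree-sibling, yet keeping (p, x) and (q, y) as tree arcs gives an admissible subset, consistent with the fact that not every tree-based network is tree-sibling.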


Corollary 2. The class of tree-based networks includes tree-child networks (and thus
hybridization networks), and, more generally, tree-sibling networks. It also includes the class
of reticulation visible networks. 


Proof. For tree-sibling networks, for each vertex v of in-degree 2, select exactly one sibling
v' of v that has in-degree 1, and if p is the common parent of v and v', label the arc (p, v)
by f. Then, if S is the set of arcs of N minus the arcs labelled f, S is an admissible subset of
arcs for N, and so, by Theorem 1, N is tree-based. For reticulation-visible networks, Gambette
et al. (2015) showed that such networks satisfy the condition described above in part (i) of
Proposition 3, which, in turn, they established suffices for N to be tree-based. □
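The tree-sibling part of this proof is constructive. The sketch below is a hypothetical helper (not the authors' code; arcs are (source, target) pairs with no parallel arcs): it marks one incoming arc per reticulation as a linking arc and then confirms, with a compact admissibility check, that the remaining arcs satisfy the conditions of Theorem 1.

```python
from collections import Counter, defaultdict


def tree_sibling_support(arcs):
    """Follow the proof of Corollary 2; return the support-tree arcs S, or None.

    For each vertex v of in-degree 2, pick a parent whose other child has
    in-degree 1 and label that parent's arc to v as a linking arc ('f').
    """
    indeg = Counter(v for _, v in arcs)
    children, parents = defaultdict(list), defaultdict(list)
    for u, v in arcs:
        children[u].append(v)
        parents[v].append(u)
    linking = set()
    for v in [w for w in parents if indeg[w] == 2]:
        p = next((q for q in parents[v]
                  if any(c != v and indeg[c] == 1 for c in children[q])), None)
        if p is None:
            return None  # v has no sibling of in-degree 1: not tree-sibling
        linking.add((p, v))
    return set(arcs) - linking


def is_admissible(arcs, S):
    """True if S contains S1 and satisfies (C1) and (C2)."""
    indeg = Counter(v for _, v in arcs)
    outdeg = Counter(u for u, _ in arcs)
    if any((outdeg[u] == 1 or indeg[v] == 1) and (u, v) not in S for u, v in arcs):
        return False
    incoming, outgoing = defaultdict(list), defaultdict(list)
    for u, v in arcs:
        outgoing[u].append((u, v))
        incoming[v].append((u, v))
    c1 = all(sum(a in S for a in ins) == 1 for ins in incoming.values() if len(ins) == 2)
    c2 = all(any(a in S for a in outs) for outs in outgoing.values() if len(outs) == 2)
    return c1 and c2


N_tc = [("r", "u"), ("r", "v"), ("u", "h"), ("u", "a"),
        ("v", "h"), ("v", "c"), ("h", "b")]
N_ok = [("r", "p"), ("r", "q"), ("p", "x"), ("p", "y"),
        ("q", "x"), ("q", "y"), ("x", "a"), ("y", "b")]
S = tree_sibling_support(N_tc)
print(is_admissible(N_tc, S), tree_sibling_support(N_ok))  # True None
```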

Note that although all tree-sibling networks are tree-based, it is easy to construct an
example of a tree-based network that is not tree-sibling (an example is provided by van
Iersel (2013)).


An algorithm 

Theorem 1 furnishes a polynomial-time algorithm that takes any binary phylogenetic
network N and determines whether or not it is tree-based. An extension can also then be
used to determine a valid support tree for N, and indeed to compute all of these (however,
there may be exponentially many, and even counting the number is unlikely to be easy).

First we describe a simple test that decides whether or not a network is tree-based, and
which is based on a well-known criterion for testing the satisfiability of any instance of 2-SAT
by taking the transitive closure of the implication relation (we give an example below).

To present this algorithm it is helpful to restate conditions (C1) and (C2) in an equivalent
way, by making two modifications. Firstly, we will indicate that an arc a is in S or not in S
by assigning the arc the label t (‘true’) or f (‘false’), respectively. Secondly, we will state
the two conditions (C1) and (C2) in the form of implications (‘if...then’) to show how a label
assigned to one arc can ‘force’ the assignment of a label to an adjacent arc.

(C1)' When v has in-degree 2: (i) if one of the incoming arcs has label t then the other
incoming arc is assigned label f, and (ii) if one of the incoming arcs has label f then the
other incoming arc is assigned label t.

(C2)' When v has out-degree 2: if one of the outgoing arcs has label f then the other outgoing
arc is assigned label t.


Now, let us label each arc in S₁ by t, and then extend this labelling to other arcs by
repeated application of rules (C1)' and (C2)' whenever they apply. Two things could
happen: either a single label is assigned to (some or all of) the arcs of N and the rules do
not assign a label to any further arcs, or else at some point an arc could be assigned a label
different from the one it received earlier in the process. It turns out that N is tree-based
precisely if this latter case does not occur. This is formalized in the following corollary of
Theorem 1, which is justified by a well-known algorithm for testing satisfiability of 2-SAT
(Krom 1967).


Corollary 3. N is tree-based if and only if case (i) does not arise under the following
procedure: assign all arcs in S₁ label t and then repeatedly apply conditions (C1)' and (C2)'
to extend this labelling to other arcs of N, until either (i) an arc is assigned a label different
from its existing label or (ii) the conditions can no longer be applied.
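The procedure of Corollary 3 is a queue-driven propagation. The following Python sketch is a hedged illustration under stated assumptions: arcs are (source, target) pairs with no parallel arcs (so an arc is identified by its endpoints), labels are stored as `True` for t and `False` for f, and the function names and the three example networks are hypothetical.

```python
from collections import Counter, defaultdict, deque


def propagate(arcs, labels):
    """Extend `labels` (arc -> True for t, False for f) in place using (C1)' and (C2)'.

    Returns False as soon as an arc would receive a second, different label
    (case (i) of Corollary 3) and True once no rule applies (case (ii)).
    """
    incoming, outgoing = defaultdict(list), defaultdict(list)
    for u, v in arcs:
        outgoing[u].append((u, v))
        incoming[v].append((u, v))
    queue = deque(labels)

    def assign(arc, value):
        if arc in labels:
            return labels[arc] == value  # False signals a conflicting label
        labels[arc] = value
        queue.append(arc)
        return True

    while queue:
        arc = queue.popleft()
        u, v = arc
        if len(incoming[v]) == 2:  # (C1)': the other incoming arc gets the opposite label
            other = incoming[v][1] if incoming[v][0] == arc else incoming[v][0]
            if not assign(other, not labels[arc]):
                return False
        if len(outgoing[u]) == 2 and not labels[arc]:  # (C2)': f forces t on the sibling arc
            other = outgoing[u][1] if outgoing[u][0] == arc else outgoing[u][0]
            if not assign(other, True):
                return False
    return True


def corollary3(arcs):
    """Label the arcs of S1 by t, propagate, and return (no_conflict, labels)."""
    indeg = Counter(v for _, v in arcs)
    outdeg = Counter(u for u, _ in arcs)
    labels = {(u, v): True for u, v in arcs if outdeg[u] == 1 or indeg[v] == 1}
    return propagate(arcs, labels), labels


N_bad = [("r", "s"), ("r", "t"), ("s", "p"), ("s", "q"), ("t", "p"),
         ("t", "q"), ("p", "z"), ("q", "z"), ("z", "a")]
N_ok = [("r", "p"), ("r", "q"), ("p", "x"), ("p", "y"),
        ("q", "x"), ("q", "y"), ("x", "a"), ("y", "b")]
N_mixed = [("r", "u"), ("r", "v"), ("u", "g"), ("v", "g"),
           ("g", "h"), ("v", "h"), ("h", "a"), ("u", "b")]
print(corollary3(N_bad)[0], corollary3(N_ok)[0], corollary3(N_mixed)[0])  # False True True
print(len(corollary3(N_ok)[1]), len(corollary3(N_mixed)[1]))  # 4 8
```

For `N_ok` only the four arcs of S₁ are labelled and the four arcs into the reticulations stay unlabelled, so more than one support tree exists; for `N_mixed` every arc is labelled, so the support tree is unique.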

Notice that the only arcs that do not receive an immediate label by their membership of
S₁ are the pairs of arcs that are incoming to a vertex of in-degree 2 (every other arc has a
target of in-degree 1). If the label for one of these arcs is subsequently determined (by
application of the conditions (C1)' and (C2)'), then the status of the other arc in the pair is
fixed by (C1)'. An example of how this algorithm works is provided in Fig. 3. In this case the
algorithm detects that the network is not tree-based, since it leads to case (i), where an arc
is assigned a label different from its existing label. Although this algorithm is easy to apply
by hand on small examples, for very large networks there exist faster (linear-time) algorithms
for deciding satisfiability of 2-SAT and these could be applied; however, these are more
technical to describe (Aspvall et al. 1979).


Suppose now that each arc of N receives at most one label. Then N is tree-based, and if
every arc of N gets a label then, by Theorem 1, there is a unique support tree for N. However,
another possibility is that only some of the arcs of N are assigned a label. In this case
there exists more than one support tree (though the network may still only be based on one
possible phylogenetic tree).

To find a support tree for N it suffices to select any arc a that remains unlabelled at
the end of the process described above, then assign one label (t or f) to a and apply the
process again of extending the labelling using repeated applications of (C1)' and (C2)'. We
can then continue this procedure (selecting an unlabelled arc and extending the labelling so
far obtained) until all arcs receive a label. The arcs labelled t then correspond to arcs of a
support tree for N, and the arcs labelled f are the linking arcs.
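The completion step can be sketched as follows. This is our own illustration rather than the authors' implementation: it repeats a compact version of the propagation, picks unlabelled arcs in list order, and, as a safeguard that goes slightly beyond the text above, tries label f whenever label t for the chosen arc leads to a conflict (the standard way of completing a 2-SAT assignment). The example networks are hypothetical (source, target) arc lists without parallel arcs.

```python
from collections import Counter, defaultdict, deque


def propagate(arcs, labels):
    """Apply (C1)' and (C2)' to `labels` in place; return False on a label conflict."""
    incoming, outgoing = defaultdict(list), defaultdict(list)
    for u, v in arcs:
        outgoing[u].append((u, v))
        incoming[v].append((u, v))
    queue = deque(labels)
    while queue:
        arc = queue.popleft()
        u, v = arc
        forced = []
        if len(incoming[v]) == 2:                       # rule (C1)'
            forced.append((next(a for a in incoming[v] if a != arc), not labels[arc]))
        if len(outgoing[u]) == 2 and not labels[arc]:   # rule (C2)'
            forced.append((next(a for a in outgoing[u] if a != arc), True))
        for other, value in forced:
            if other in labels and labels[other] != value:
                return False
            if other not in labels:
                labels[other] = value
                queue.append(other)
    return True


def support_tree(arcs):
    """Return the arcs labelled t (a support tree), or None if N is not tree-based."""
    indeg = Counter(v for _, v in arcs)
    outdeg = Counter(u for u, _ in arcs)
    labels = {(u, v): True for u, v in arcs if outdeg[u] == 1 or indeg[v] == 1}
    if not propagate(arcs, labels):
        return None
    for arc in arcs:
        if arc in labels:
            continue
        for guess in (True, False):  # try t first; fall back to f on a conflict
            trial = dict(labels)
            trial[arc] = guess
            if propagate(arcs, trial):
                labels = trial
                break
        else:
            return None
    return {a for a, is_tree_arc in labels.items() if is_tree_arc}


N_bad = [("r", "s"), ("r", "t"), ("s", "p"), ("s", "q"), ("t", "p"),
         ("t", "q"), ("p", "z"), ("q", "z"), ("z", "a")]
N_ok = [("r", "p"), ("r", "q"), ("p", "x"), ("p", "y"),
        ("q", "x"), ("q", "y"), ("x", "a"), ("y", "b")]
print(sorted(support_tree(N_ok)))
# [('p', 'x'), ('q', 'y'), ('r', 'p'), ('r', 'q'), ('x', 'a'), ('y', 'b')]
print(support_tree(N_bad))  # None
```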

It is possible in this way to generate all the possible support trees for N; however, there
may be exponentially many of them, since if N has k vertices of in-degree 2, then the number
of support trees can be as large as 2^k. Even counting the number of support trees may be
hard, since counting the number of satisfying solutions of 2-SAT is known to be #P-complete
(Valiant 1979).
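Because (C1) forces each vertex of in-degree 2 to keep exactly one incoming arc, and every other arc has a target of in-degree 1 and so lies in S₁, the admissible subsets can be listed by trying the 2^k ways of choosing one incoming arc per vertex of in-degree 2. The brute-force sketch below (hypothetical helper and example networks, arcs as (source, target) pairs) makes the 2^k bound concrete; given the #P-completeness just mentioned, it is only practical for small k.

```python
from collections import Counter, defaultdict
from itertools import product


def all_support_trees(arcs):
    """Enumerate every admissible subset S by brute force over reticulations.

    Each vertex of in-degree 2 keeps exactly one incoming arc (C1); candidates
    that drop an arc of S1 or violate (C2) are discarded.
    """
    indeg = Counter(v for _, v in arcs)
    outdeg = Counter(u for u, _ in arcs)
    s1 = {(u, v) for u, v in arcs if outdeg[u] == 1 or indeg[v] == 1}
    incoming, outgoing = defaultdict(list), defaultdict(list)
    for u, v in arcs:
        outgoing[u].append((u, v))
        incoming[v].append((u, v))
    retics = [v for v in incoming if len(incoming[v]) == 2]
    trees = []
    for choice in product((0, 1), repeat=len(retics)):  # 2**k candidates
        dropped = {incoming[v][1 - c] for v, c in zip(retics, choice)}
        S = set(arcs) - dropped
        if s1 <= S and all(any(a in S for a in outs)
                           for outs in outgoing.values() if len(outs) == 2):
            trees.append(S)
    return trees


N_ok = [("r", "p"), ("r", "q"), ("p", "x"), ("p", "y"),
        ("q", "x"), ("q", "y"), ("x", "a"), ("y", "b")]
N_bad = [("r", "s"), ("r", "t"), ("s", "p"), ("s", "q"), ("t", "p"),
         ("t", "q"), ("p", "z"), ("q", "z"), ("z", "a")]
N_mixed = [("r", "u"), ("r", "v"), ("u", "g"), ("v", "g"),
           ("g", "h"), ("v", "h"), ("h", "a"), ("u", "b")]
print([len(all_support_trees(n)) for n in (N_ok, N_bad, N_mixed)])  # [2, 0, 1]
```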


Example 

We now provide a simple illustration of how this algorithm works by applying it to a
phylogenetic network proposed recently by Marcussen et al. (2014) to represent the complex
hybrid evolution of bread wheat. Our application here is not intended to provide support for
or against particular claims in that paper. Rather, the purpose is to show how the algorithm
can be applied to a small but realistic phylogenetic network to determine whether or not it is
tree-based, and if it is, to illustrate how the tree(s) and linking arcs can be readily identified.


Figure 4: (i) A network from Marcussen et al. (2014) showing three ancient hybridization
events in the evolution of bread wheat. Corollary 3 terminates at (ii) to show that the network
is tree-based. Finding a particular support tree requires selecting a label for an unlabelled
arc, extending the labelling using the rules (C1)' and (C2)' repeatedly, and continuing this
process. In this example, the six unlabelled arcs consist of three pairs, with the arcs in each
pair incoming to one of the three vertices of in-degree 2. Assigning a label (t or f) to an arc
in one pair determines the assignment for the other arc in that pair but does not force any
further arc assignments. Thus an independent assignment can be made for each of the three
pairs, leading to 2^3 = 8 choices in total. For the particular choice shown in (iii) we obtain the
support tree T' shown in (iv) and thereby the tree-based representation shown in (v). In
this example, the eight rooted binary phylogenetic trees that the network can be based on
are all distinct.
Fig. 4(i) shows a binary phylogenetic network on five leaves and three reticulations (vertices
of in-degree 2). This network is essentially equivalent to the one shown in Fig. 3 of
Marcussen et al. (2014), under the taxon labelling a = Triticum urartu, b = Triticum turgidum,
c = Triticum aestivum, d = Aegilops tauschii, e = Aegilops speltoides.

First notice that the algorithm in Corollary 3 tells us immediately that N is tree-based,
since the initial labelling of the arcs in S₁ by t (shown in Fig. 4(ii)) does not extend further.
To find a support tree, we see that assigning t to either of the two arcs arriving at the lowest
reticulation vertex does not cause any dual-labelled arc to arise. The same holds at the
other two reticulation vertices, and these choices can all be made independently. Thus there
are 2^3 = 8 possible support trees, and in this case the eight associated rooted binary
phylogenetic trees (of which the support trees are subdivisions) are all distinct.


The trees displayed by a tree-based network 


We have seen that a tree-based network can be based on more than one tree. An obvious
question then is what one can say about the trees that can act as a base for a given
tree-based network N. There is a related notion that applies to any binary phylogenetic network
(tree-based or not), namely the concept of ‘displaying’ a rooted phylogenetic tree, which we
need to recall first. Given a binary phylogenetic network N over X, N is said to display a
rooted binary phylogenetic tree T if T can be obtained from N by deleting arcs and vertices,
and suppressing any resulting vertices of in-degree and out-degree equal to 1 (Cordue et al.
2014). Notice that if N has at most k vertices of in-degree 2, then it can display at most
2^k trees, and there has been some recent interest in identifying a class of networks for which
this holds (Willson 2010) or quantifying the extent to which it can fail (Cordue et al. 2014).
A second active area of interest has involved determining the computational complexity of
deciding (for various categories of networks) whether a given network N displays a given
tree T (Gambette et al. 2015; van Iersel et al. 2015).

It is clear that if N is a tree-based network, and is based on T, then N must display T,
since we can just delete the linking arcs, and suppress any resulting vertices of in-degree and
out-degree equal to 1. However, it is possible for a tree-based network to display the tree T
but fail to be based on T (see Fig. 5). In other words, the notion of a network being based
on a tree is stronger than simply displaying the tree.

Moreover, it turns out that the set of trees that are displayed by a network that is based 
on T need not bear any relation at all to T; indeed, for each positive integer n, there is a 
tree-based network displaying all trees on n leaves. This network can be based on any tree 
on n leaves as the following result shows (its proof is also in the Appendix): 


Proposition 4. For any n > 2 and any rooted binary phylogenetic X-tree T on n leaves, 
there is a tree-based binary network N over X, based on T and with order linking arcs, 
such that N displays all rooted binary phylogenetic X-trees. 
Figure 5: A network that is based on a tree ab|c (left) and displays bc|a (right), but is not
based on the tree bc|a. Notice that in the right-hand network, vertex v requires a linking arc
to be attached to another linking arc. 


Concluding comments 

We end with some final comments.

1. Establishing that a network is based on a tree T does not necessarily mean that the 
evolution of the taxa under study was primarily represented by T (or, indeed, by any
rooted tree) with just some additional transfer events (like horizontal gene transfer, or
endosymbiosis) between branches of the tree. As we have seen, hybridization networks 
can also be tree-based, even though they are described somewhat differently. Rather, 
tree-based means that one can represent evolution using a rooted tree and linking arcs, 
and this does not, in itself, confer or require any particular mechanism of evolution for 
the taxa under study. 

2. Our results suggest a number of further relevant questions. We have seen that a
tree-based network N can be based on more than one tree. However, given a network, how
many base trees can it have? 

(a) Is it possible to characterize the set of rooted binary phylogenetic trees on which 
N can be based? 

(b) Given a tree-based network N and an arbitrary rooted binary phylogenetic tree 
T, can it be decided in polynomial time whether or not N is based on T? 

(c) Is it possible that there is a network on a leaf set X that can be based on every
tree on X? The answer is “yes” for |X| = 3, as shown in Fig. 1.

3. The networks we have studied so far are required to be acyclic. However, basing a 
network N on a tree T suggests adopting a possibly stronger condition that relies on 
the assignment of an ordering of the vertices of N to reflect the temporal nature of 
vertical (tree-like) and horizontal (reticulate) evolution. More precisely, suppose that 
N is a network based on T. A map t from the vertices of N to the real numbers (or 
the integers) is then a strong temporal ordering for N relative to a valid support tree 
T' derived from T (i.e. one containing all the vertices of N), provided that t satisfies 
the two properties: 
(i) If (u, v) is any arc of T', then t(u) < t(v).

(ii) If (u, v) is a linking arc, then t(u) = t(v).


Condition (i) reflects the biological point that vertical evolution (a lineage persisting through time plus speciation events) proceeds on a natural time scale. Condition (ii) captures the notion that reticulate evolution requires the two donor species to both be extant at a common time in the past. However, if we allow for additional species (not sampled, or perhaps now extinct; Szöllősi et al. 2015) to play a role in evolution, then Condition (ii) needs to be relaxed to the following condition:


(ii)' If (u, v) is a linking arc, then t(u) ≤ t(v).


When t satisfies (i) and (ii)', we say that t is a weak temporal ordering for N relative to T'.
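To make conditions (i), (ii) and (ii)' concrete, the following minimal Python sketch classifies a given map t relative to a support tree. It assumes a representation that is our own choice, not the paper's: the arcs of N are already split into the arcs of T' and the linking arcs, each arc is a pair (u, v), and t is a dict from vertices to numbers. The function name and the vertex labels in the example are hypothetical.

```python
def temporal_ordering_type(tree_arcs, linking_arcs, t):
    """Classify t (a dict vertex -> number) relative to a support tree T'.

    Returns "strong" if conditions (i) and (ii) hold, "weak" if (i) and (ii)'
    hold but (ii) does not, and None if (i) or (ii)' fails.
    """
    # Condition (i): time strictly increases along every arc of T'.
    if any(t[u] >= t[v] for u, v in tree_arcs):
        return None
    # Condition (ii): both ends of every linking arc are contemporaneous.
    if all(t[u] == t[v] for u, v in linking_arcs):
        return "strong"
    # Condition (ii)': a linking arc never points backwards in time.
    if all(t[u] <= t[v] for u, v in linking_arcs):
        return "weak"
    return None


# Example: the tree ((a,b),c) with one linking arc from a point p on the
# pendant arc of c to a point h on the pendant arc of a (labels hypothetical).
tree = [("r", "s"), ("r", "p"), ("p", "c"), ("s", "h"), ("h", "a"), ("s", "b")]
links = [("p", "h")]
t = {"r": 0, "s": 1, "p": 2, "h": 2, "a": 3, "b": 2, "c": 3}
print(temporal_ordering_type(tree, links, t))  # strong
```

Moving p to an earlier time than h (for instance t(p) = 1, t(c) = 2) keeps (i) and (ii)' but breaks (ii), so the same call would then return "weak".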


Notice that the (weak) temporal ordering condition in itself implies that N must be acyclic. Suppose that v₁, …, vₖ = v₁ is a directed cycle in a tree-based network. Since no two linking arcs share a vertex, the cycle cannot consist of linking arcs alone, so some pair of consecutive vertices in the cycle, say vᵢ and vᵢ₊₁, forms an arc of the support tree, and so t(vᵢ) < t(vᵢ₊₁). However, the t-values of the vertices along the remainder of the cycle, from vᵢ₊₁ back to vᵢ, are non-decreasing (by (i) and (ii)'), so t(vᵢ₊₁) ≤ t(vᵢ). Together these give t(vᵢ) < t(vᵢ), which is a contradiction. Fig. 6 illustrates three tree-based networks that have no strong temporal ordering relative to any valid support tree.


It turns out that every acyclic network (and thus every tree-based network) has a weak temporal ordering. To see this, note that because N is an acyclic directed graph, it is possible to order the vertices v₁, v₂, … so that if (vᵢ, vⱼ) is an arc of N then i < j (Proposition 1.4.2 of Bang-Jensen and Gutin (2001)). Thus if we let t(vᵢ) = i for each i, we obtain a weak temporal ordering for N (relative to any valid support tree, since then t(u) < t(v) for every arc (u, v) of N, so both (i) and (ii)' hold). In other words, if we accept the justification for relaxing temporal ordering based on the possible role of unsampled or extinct taxa in the reticulate evolution of the extant species under study, then the resulting weak temporal ordering constraint does not provide any real restriction on the class of tree-based networks.
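The ordering v₁, v₂, … used above is a topological order, and one standard way to compute it is Kahn's algorithm (a technique we swap in here; the text only cites its existence). The Python sketch below, with a hypothetical function name, returns t(vᵢ) = i for such an order, or None when the digraph has a directed cycle. It assumes N is given as a vertex list and a list of arcs (u, v); the example is the tree ((a,b),c) with one linking arc (p, h), using hypothetical labels.

```python
from collections import deque


def weak_temporal_ordering(vertices, arcs):
    """Return t with t[u] < t[v] for every arc (u, v), namely t(v_i) = i for a
    topological order v_1, v_2, ... (Kahn's algorithm); None if N has a cycle."""
    indeg = {v: 0 for v in vertices}
    children = {v: [] for v in vertices}
    for u, v in arcs:
        children[u].append(v)
        indeg[v] += 1
    ready = deque(v for v in vertices if indeg[v] == 0)  # e.g. the root
    t = {}
    while ready:
        u = ready.popleft()
        t[u] = len(t) + 1  # next position in the topological order
        for v in children[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                ready.append(v)
    # Vertices on a directed cycle never reach in-degree 0.
    return t if len(t) == len(vertices) else None


V = ["r", "s", "p", "h", "a", "b", "c"]
A = [("r", "s"), ("r", "p"), ("p", "c"), ("p", "h"),
     ("s", "h"), ("h", "a"), ("s", "b")]
t = weak_temporal_ordering(V, A)
print(t)  # {'r': 1, 's': 2, 'p': 3, 'b': 4, 'c': 5, 'h': 6, 'a': 7}
```

Because every arc, tree or linking, gets a strict increase, the result satisfies (i) and (ii)' for whichever valid support tree is chosen.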
Figure 6: Three networks with no strong temporal ordering relative to any valid support
tree. Network (i) fails to be acyclic (and so is technically not even a binary phylogenetic 
network), but Networks (ii) and (iii) are acyclic (and so have a weak temporal ordering). 
Appendix: Mathematical proofs


Proof of Theorem 

Proof. Part (a) Suppose that N is tree-based, and let T' be a support tree for N. Then the set S of arcs of T' contains S₁, and S also satisfies conditions (C₁) and (C₂) for every vertex v ∈ V of in-degree or out-degree 2, respectively. Thus S is admissible.

Conversely, suppose that S is an admissible subset of arcs of N. Consider the network N' = (V, S) consisting of all the vertices in N and just the arcs in S. We claim that this is a rooted tree, with root ρ (the root of N) and leaf set X (the leaf set of N). Firstly, notice that N' has no vertex of in-degree 2, by condition (C₁). Secondly, every arc a that is incoming to a leaf x ∈ X of N is present in S, and so a is also an arc of N', and so the leaf set of N' contains X. It remains to check that (1) N' contains no other leaves, and (2) the only vertex of in-degree 0 in N' is ρ. For (1), suppose v is a vertex of N' that is not in X. Then in N, v has strictly positive out-degree. If v has out-degree 1 in N, then the outgoing arc from v is present in S₁ and thereby in S, while if v has out-degree 2, at least one of the two outgoing arcs is present in S by condition (C₂). Thus, v cannot be a leaf of N', establishing claim (1). Turning to claim (2), suppose that v has in-degree 0 in N'. Then either (i)' v has in-degree 1 or 2 in N, or (ii)' v is the root vertex of N. Now (i)' cannot hold, since the admissibility of S implies that at least one incoming arc into v is present in N' (if v has in-degree 1, then the incoming arc lies in S₁ and hence S, while if v has in-degree 2, condition (C₁) implies that one incoming arc into v is present in N'). Case (ii)' must now apply, since every finite acyclic network has at least one vertex of in-degree 0. This establishes claim (2). Since N' is acyclic (being a subgraph of N), has no vertex of in-degree 2, and has ρ as its only vertex of in-degree 0, it is a rooted tree with root ρ and leaf set X; this establishes the “if” direction in the first statement of Part (a).
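The two halves of this argument can be checked mechanically on small examples. The Python sketch below (hypothetical function names) tests whether a subset S is admissible and, separately, whether N' = (V, S) is a rooted tree with root ρ and leaf set X. It assumes that N is a list of distinct arcs (u, v) and that S₁ is the set of arcs (u, v) with u of out-degree 1 or v of in-degree 1 in N; this is our reading of the definition, which is not restated in this section. The example network is the tree ((a,b),c) with one linking arc (p, h), with hypothetical labels.

```python
from collections import Counter


def is_admissible(arcs, S):
    """Check S1 ⊆ S, (C1) and (C2) for a subset S of the arcs of N.

    Assumption: S1 = arcs (u, v) with out-degree(u) == 1 or in-degree(v) == 1.
    """
    S = set(S)
    outdeg = Counter(u for u, _ in arcs)
    indeg = Counter(v for _, v in arcs)
    if any((outdeg[u] == 1 or indeg[v] == 1) and (u, v) not in S
           for u, v in arcs):
        return False  # S1 ⊆ S fails
    for x in set(outdeg) | set(indeg):
        ins = [a for a in arcs if a[1] == x]
        outs = [a for a in arcs if a[0] == x]
        if len(ins) == 2 and sum(a in S for a in ins) != 1:
            return False  # (C1) fails: not exactly one incoming arc kept
        if len(outs) == 2 and not any(a in S for a in outs):
            return False  # (C2) fails: no outgoing arc kept
    return True


def is_rooted_tree(vertices, S, root, leaves):
    """Check that N' = (V, S) is a rooted tree with the given root and leaves."""
    outdeg = Counter(u for u, _ in S)
    indeg = Counter(v for _, v in S)
    if indeg[root] != 0 or any(indeg[v] != 1 for v in vertices if v != root):
        return False
    if {v for v in vertices if outdeg[v] == 0} != set(leaves):
        return False
    children = {v: [] for v in vertices}
    for u, v in S:
        children[u].append(v)
    seen, stack = {root}, [root]  # every vertex must hang below the root
    while stack:
        for w in children[stack.pop()]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen == set(vertices)


V = ["r", "s", "p", "h", "a", "b", "c"]
A = [("r", "s"), ("r", "p"), ("p", "c"), ("p", "h"),
     ("s", "h"), ("h", "a"), ("s", "b")]
S = set(A) - {("p", "h")}
print(is_admissible(A, S), is_rooted_tree(V, S, "r", {"a", "b", "c"}))  # True True
```

Keeping both incoming arcs of h, or neither, makes both checks fail, matching the role of condition (C₁) in the proof.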

For the second statement of Part (a), we simply observe that the function S ↦ (V, S) from admissible subsets of N to valid support trees for N is a bijection, since it has a (left and right) inverse in the opposite direction, namely T' = (V, S) ↦ S.
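Because of this bijection, listing admissible subsets lists valid support trees. The brute-force Python sketch below (hypothetical name, exponential in the number of in-degree-2 vertices) uses the fact that, if S₁ is taken to be the set of arcs (u, v) with u of out-degree 1 or v of in-degree 1 (our assumed reading of the definition), every arc outside S₁ ends at a vertex of in-degree 2. So an admissible S is determined by choosing one incoming arc at each such vertex. Distinct support trees may suppress to the same base tree, so the count is only an upper bound on the number of distinct base trees.

```python
from collections import Counter
from itertools import product


def admissible_subsets(arcs):
    """Yield every admissible subset of arcs of N (brute force).

    Assumption: S1 = arcs (u, v) with out-degree(u) == 1 or in-degree(v) == 1.
    Every arc outside S1 then ends at an in-degree-2 vertex, so an admissible
    S is fixed by picking one incoming arc at each such vertex, as (C1) needs.
    """
    outdeg = Counter(u for u, _ in arcs)
    indeg = Counter(v for _, v in arcs)
    s1 = {(u, v) for u, v in arcs if outdeg[u] == 1 or indeg[v] == 1}
    retics = list(dict.fromkeys(v for _, v in arcs if indeg[v] == 2))
    incoming = [[a for a in arcs if a[1] == h] for h in retics]
    for picks in product(*incoming):
        S = s1 | set(picks)
        c1 = all(sum(a in S for a in ins) == 1 for ins in incoming)
        c2 = all(any(a in S for a in arcs if a[0] == u)
                 for u in outdeg if outdeg[u] == 2)
        if c1 and c2:
            yield frozenset(S)


# The tree ((a,b),c) plus one linking arc (p, h); labels are hypothetical.
A = [("r", "s"), ("r", "p"), ("p", "c"), ("p", "h"),
     ("s", "h"), ("h", "a"), ("s", "b")]
print(len(list(admissible_subsets(A))))  # 2
```

Here the two valid support trees suppress to the two different base trees ab|c and ac|b.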

Part (b) We will show that any rooted phylogenetic network can be translated directly into an instance of 2-SAT (a conjunction of clauses, each involving just two literals or their negations) in such a way that the existence of an admissible subset of arcs for N corresponds to the satisfiability of the corresponding 2-SAT instance.

Given N, let the set of literals be the arc set A, and consider the conjunction of the following clauses $C_N$:

$$C_N := \bigwedge_{a \in S_1} C_a \;\wedge\; \bigwedge_{v \in V_{\text{in-2}}} \left(C_v \wedge C'_v\right) \;\wedge\; \bigwedge_{w \in V_{\text{out-2}}} C_w,$$

where $V_{\text{in-2}}$ (resp. $V_{\text{out-2}}$) is the set of vertices of N of in-degree 2 (resp. out-degree 2) and

$$C_a = a \quad \text{for } a \in S_1, \tag{1}$$

while for $v \in V_{\text{in-2}}$ (with incoming arcs $a, b$):

$$C_v = (a \vee b) \quad \text{and} \quad C'_v = (\neg a \vee \neg b); \tag{2}$$

and for $w \in V_{\text{out-2}}$ (with outgoing arcs $a', b'$):

$$C_w = a' \vee b'. \tag{3}$$


Notice that $C_N$ is an instance of 2-SAT (the single-literal clause $C_a = a$ can be written as $a \vee a$), and that if we interpret a truth assignment to A as indicating whether $a \in A$ is an element of S (‘true’) or of $A - S$ (‘false’), then $C_N$ is satisfiable if and only if N has an admissible subset S (since the three types of clauses in (1)–(3) respectively capture the condition $S_1 \subseteq S$ and conditions (C₁) and (C₂) for admissibility).

Part (b) now follows from the equivalence between admissibility and being tree-based in Part (a), and the classic result (dating back to Krom (1967)) that any instance of 2-SAT can be solved in polynomial time (indeed in linear time by more recent techniques (Aspvall et al. 1979)).


□ 
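The reduction in Part (b) is short enough to sketch in full. The Python code below builds the clauses (1)–(3) and solves them with the standard implication-graph method usually credited to Aspvall et al. (1979): a clause p ∨ q becomes the implications ¬p → q and ¬q → p, and the instance is unsatisfiable exactly when some literal and its negation share a strongly connected component. We use Kosaraju's SCC algorithm rather than Tarjan's; that choice, the function names, and the arc-list input are our assumptions. As before, S₁ is assumed to be the set of arcs (u, v) with u of out-degree 1 or v of in-degree 1.

```python
from collections import Counter


def _scc(n, adj):
    """Kosaraju's algorithm (iterative). Component ids follow a topological
    order of the condensation: if x -> y then comp[x] <= comp[y]."""
    order, seen = [], [False] * n
    for s in range(n):
        if seen[s]:
            continue
        seen[s] = True
        stack = [(s, iter(adj[s]))]
        while stack:
            v, it = stack[-1]
            w = next(it, None)
            if w is None:
                stack.pop()
                order.append(v)  # v is finished
            elif not seen[w]:
                seen[w] = True
                stack.append((w, iter(adj[w])))
    radj = [[] for _ in range(n)]
    for v in range(n):
        for w in adj[v]:
            radj[w].append(v)
    comp, c = [-1] * n, 0
    for s in reversed(order):  # decreasing finishing time
        if comp[s] != -1:
            continue
        comp[s], stack = c, [s]
        while stack:
            v = stack.pop()
            for w in radj[v]:
                if comp[w] == -1:
                    comp[w] = c
                    stack.append(w)
        c += 1
    return comp


def tree_based_2sat(arcs):
    """Build C_N from clauses (1)-(3) and solve it.

    Returns an admissible arc set S (the arcs of a valid support tree), or
    None if C_N is unsatisfiable, i.e. N is not tree-based.
    Assumptions: arcs are distinct (u, v) pairs and
    S1 = arcs (u, v) with out-degree(u) == 1 or in-degree(v) == 1.
    """
    outdeg = Counter(u for u, _ in arcs)
    indeg = Counter(v for _, v in arcs)
    idx = {a: i for i, a in enumerate(arcs)}

    def lit(a, value=True):
        # Literal 2i means "arc i is in S"; 2i + 1 means "arc i is not in S".
        return 2 * idx[a] + (0 if value else 1)

    clauses = []
    for a in arcs:
        if outdeg[a[0]] == 1 or indeg[a[1]] == 1:
            clauses.append((lit(a), lit(a)))  # (1): C_a = a, written a or a
    for x in set(outdeg) | set(indeg):
        ins = [a for a in arcs if a[1] == x]
        outs = [a for a in arcs if a[0] == x]
        if len(ins) == 2:
            clauses.append((lit(ins[0]), lit(ins[1])))  # (2): C_v
            clauses.append((lit(ins[0], False), lit(ins[1], False)))  # C'_v
        if len(outs) == 2:
            clauses.append((lit(outs[0]), lit(outs[1])))  # (3): C_w
    n = 2 * len(arcs)
    adj = [[] for _ in range(n)]
    for p, q in clauses:  # p or q  ==  (not p -> q) and (not q -> p)
        adj[p ^ 1].append(q)
        adj[q ^ 1].append(p)
    comp = _scc(n, adj)
    if any(comp[2 * i] == comp[2 * i + 1] for i in range(len(arcs))):
        return None
    # Set an arc true when its literal lies later in topological order.
    return {a for a in arcs if comp[lit(a)] > comp[lit(a, False)]}
```

For example, on the tree ((a,b),c) with one linking arc (p, h) (hypothetical labels), the returned S omits exactly one of the two incoming arcs of h, while a network containing a tree vertex whose two children are reticulations that each have another reticulation as a parent yields None.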


Proof of Proposition 

Proof. [(a) ⇔ (b)] Suppose that N is tree-based. Then for any tree-based representation for N, no two linking arcs can share the same vertex (by considering the various possible cases). Moreover, a linking arc is never incoming to an in-degree 1 vertex, or outgoing from an out-degree 1 vertex, and so deleting linking arcs will not disconnect the network. Thus if we take I to be the linking arcs in any tree-based representation for N, then N' = (V, A − I) is an associated support tree for N. Conversely, suppose that (b) holds for some set I. Since N' is connected it has leaf set X, and so N' is a subdivision of some rooted phylogenetic X-tree T. Now if we regard each arc in I as a linking arc then we recover N (since I is independent, and N' is connected, these arcs are all placed validly).
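The first claim above, that linking arcs are pairwise vertex-disjoint, can be checked directly once a support tree is known. The Python sketch below (hypothetical name) takes the linking arcs to be the arcs of N left out of a candidate support tree's arc set S, returns I = A − S, and raises an error if two arcs of I share a vertex. The list-of-arcs representation and the vertex labels in the example are our assumptions.

```python
def linking_arcs(arcs, S):
    """Return I = A - S, raising ValueError if two arcs of I share a vertex."""
    I = set(arcs) - set(S)
    ends = [x for arc in I for x in arc]  # both endpoints of every arc in I
    if len(ends) != len(set(ends)):
        raise ValueError("two linking arcs share a vertex; "
                         "S is not the arc set of a valid support tree")
    return I


# The tree ((a,b),c) plus one linking arc (p, h); labels are hypothetical.
A = [("r", "s"), ("r", "p"), ("p", "c"), ("p", "h"),
     ("s", "h"), ("h", "a"), ("s", "b")]
print(linking_arcs(A, set(A) - {("s", "h")}))  # {('s', 'h')}
```

Dropping both incoming arcs of h from S would leave two arcs of I meeting at h, which the check rejects.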

[(a) ⇔ (c)] If N is tree-based, then any valid support tree T' for N satisfies the conditions specified in (c). Conversely, suppose that T is a rooted spanning tree of N (with root ρ) that contains the arcs in S₁, and has no leaves outside of X. Since T is a spanning tree it contains all vertices of N, and any additional arcs in N are either (i) from a vertex of out-degree 1 in T to a vertex having in-degree and out-degree equal to 1 in T, or (ii) from a vertex of out-degree 0 in T to a vertex having in-degree and out-degree equal to 1 in T; however, case (ii) is excluded by the assumption that T has no leaves outside of X. Thus if we let S be the set of arcs of T, then S contains S₁, condition (C₁) holds (since T is a tree), and condition (C₂) also holds from case (i). Thus S is an admissible subset of arcs for N, and so N is tree-based. □


Proof of Proposition 
Proof. Suppose N is based on a tree. Then any antichain 𝒜 of N is also an antichain in any support tree T' for N, since removing the linking arcs in returning to T' from N cannot create paths between vertices. For each vertex v ∈ 𝒜, select a leaf xᵥ of T' that lies below v (i.e. there is a directed path in T' from v to xᵥ). Since T' is a tree and 𝒜 is an antichain, these |𝒜| paths are all arc-disjoint. Moreover, the reinsertion of the linking arcs in moving from T' to N does not alter the arc-disjointness of these |𝒜| paths.

For the second claim, observe that X is itself an antichain of N of size |X|. Suppose there were an antichain 𝒜 of N of size strictly greater than |X|; we will show that this implies that N is not tree-based. Let 𝒜₁ and 𝒜₂ be the sets of vertices in 𝒜 that are leaves and non-leaves, respectively. Since |𝒜| > |X|, it follows that |𝒜₂| ≥ 1. Now if there were |𝒜₂| arc-disjoint paths from 𝒜₂ to the leaves of N, then these |𝒜₂| leaves together with 𝒜₁ would comprise |𝒜| distinct leaves (the paths end at different leaves because each leaf has in-degree 1, and none of these leaves lies in 𝒜₁ because 𝒜 is an antichain). Since |𝒜| > |X|, this is not possible, and so 𝒜₂ violates the antichain-to-leaf property, and hence N is not a tree-based network. □
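For a single given antichain, the antichain-to-leaf condition can be tested with a unit-capacity maximum flow, a standard technique we swap in here (it is not part of the proof above). Joining a hypothetical super-source to every vertex of 𝒜 and every leaf to a hypothetical super-sink, all with capacity 1, the maximum flow value equals the largest number of arc-disjoint paths from distinct vertices of 𝒜 to distinct leaves. The Python sketch below uses Edmonds–Karp augmentation; it checks one antichain only, since enumerating all antichains would be exponential. The function name and input format are our assumptions.

```python
from collections import defaultdict, deque


def max_paths_to_leaves(arcs, antichain):
    """Largest number of arc-disjoint paths from distinct vertices of the
    antichain to distinct leaves of N, computed as a unit-capacity max flow."""
    src, snk = object(), object()  # hypothetical super-source and super-sink
    cap = defaultdict(int)
    nbrs = defaultdict(set)

    def add_arc(u, v):
        cap[(u, v)] += 1
        nbrs[u].add(v)
        nbrs[v].add(u)  # allow residual (backward) moves

    tails = {u for u, _ in arcs}
    for u, v in arcs:
        add_arc(u, v)
    for a in antichain:
        add_arc(src, a)
    for x in {v for _, v in arcs} - tails:  # leaves: no outgoing arcs
        add_arc(x, snk)

    flow = 0
    while True:  # Edmonds-Karp: augment along shortest residual paths
        parent, queue = {src: None}, deque([src])
        while queue and snk not in parent:
            u = queue.popleft()
            for v in nbrs[u]:
                if v not in parent and cap[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        if snk not in parent:
            return flow
        v = snk
        while parent[v] is not None:  # push one unit along the path found
            u = parent[v]
            cap[(u, v)] -= 1
            cap[(v, u)] += 1
            v = u
        flow += 1


# A hypothetical network with two leaves and an antichain {h1, h2, w} of size 3.
NTB = [("r", "m"), ("r", "w"), ("m", "p"), ("m", "q"), ("p", "h1"), ("p", "h2"),
       ("q", "h1"), ("q", "h2"), ("h1", "g1"), ("h2", "g2"), ("w", "g1"),
       ("w", "g2"), ("g1", "y1"), ("g2", "y2")]
print(max_paths_to_leaves(NTB, ["h1", "h2", "w"]))  # 2
```

Since 2 < 3, this antichain violates the antichain-to-leaf property, so by the proposition the example network is not tree-based; it also illustrates the second claim, as its antichain is larger than its leaf set.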


Proof of Proposition 

Proof. Order the leaves of T as x₁, x₂, x₃, …, xₙ. For each xᵢ, consider the pendant arc aᵢ of T that is incident with xᵢ. Place a linking arc from aᵢ to aⱼ for each pair i, j with i < j. Above these n(n − 1)/2 arcs, place another set of n(n − 1)/2 linking arcs, again from aᵢ to aⱼ for each pair i, j with i < j. Continue this process so as to place a total of n − 1 such collections of n(n − 1)/2 linking arcs between the pendant arcs of T, to obtain a network N based on T containing n(n − 1)²/2 linking arcs altogether (see Fig. 7).
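The count of linking arcs is simple bookkeeping: n − 1 levels, each with one arc per pair i < j. The Python sketch below (hypothetical name) lists the arcs as (level, i, j) triples, meaning a linking arc from aᵢ to aⱼ at that level, and checks the total n(n − 1)²/2 for small n. It does not build the network or verify that it displays every tree.

```python
from itertools import combinations


def construction_linking_arcs(n):
    """Linking arcs of the construction as (level, i, j): at each level
    1, ..., n - 1 there is one arc from pendant arc a_i to a_j for every i < j."""
    return [(level, i, j)
            for level in range(1, n)
            for i, j in combinations(range(1, n + 1), 2)]


for n in range(2, 7):
    # (n - 1) levels times n(n - 1)/2 pairs gives n(n - 1)^2 / 2 arcs in total.
    assert len(construction_linking_arcs(n)) == n * (n - 1) ** 2 // 2

print(len(construction_linking_arcs(4)))  # 18
```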



[Figure: linking arcs between the pendant arcs of T, arranged in levels 1, 2, …, n − 1]

Figure 7: A tree-based binary network that displays all rooted binary phylogenetic X-trees with n leaves.


We claim that N displays all rooted binary phylogenetic X-trees. To see this, note that any rooted binary phylogenetic X-tree T' can be constructed by a sequence of n − 1 steps of a ‘coalescent’ process which starts with a graph of n isolated leaves. At each step this process joins two components of the graph constructed so far to a new common root vertex (the number of components of the resulting forest decreases by 1 at each step, and so we arrive at a tree after n − 1 steps); for an example of this coalescent process, see Fig. 2.8 of Semple and Steel (2003). The generous placement of the linking arcs in N allows this coalescent process to be realised (for any tree T') in N. □
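The coalescent description can be made concrete by reading the merges off a given tree. The Python sketch below (hypothetical name) assumes a tree is written as nested pairs with leaf labels as strings, our own representation, and returns one valid sequence of n − 1 merges, each recording the two components (as leaf sets) joined below a new vertex. Other orders of the same merges are equally valid; this one merges children before their parents.

```python
def coalescent_steps(tree):
    """List n - 1 merges that build `tree` from n isolated leaves.

    `tree` is a nested pair of pairs with leaf labels as strings, e.g.
    (("a", "b"), "c"). Each step records the two components (as leaf sets)
    joined below a new vertex; children are merged before parents.
    """
    steps = []

    def build(node):
        if isinstance(node, str):
            return frozenset([node])
        left, right = build(node[0]), build(node[1])
        steps.append((left, right))
        return left | right

    build(tree)
    return steps


for left, right in coalescent_steps((("a", "b"), ("c", "d"))):
    print(sorted(left), "+", sorted(right))
# ['a'] + ['b']
# ['c'] + ['d']
# ['a', 'b'] + ['c', 'd']
```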